Forecasting hierarchical and grouped time series

What are hierarchical and grouped time series?

Let's understand this with an example. A time series can have many other time series nested inside it. We categorise these structures into three categories:

  • Hierarchical Time Series
  • Grouped time series
  • Mixed hierarchical and grouped structure

For example: the sale of hybrid bikes can be disaggregated by type of bike, such as city, commuting, and comfort bikes.

Hierarchical Time Series

  • A hierarchical structure arises when the data split in a strictly nested way, for example into city, state, and country.

$y_t = y_{AA,t} + y_{AB,t} + y_{AC,t} + y_{BA,t} + y_{BB,t}$

$y_{A,t} = y_{AA,t} + y_{AB,t} + y_{AC,t}$

$y_{B,t} = y_{BA,t} + y_{BB,t}$

As the equations above show, if we sum up the bottom level of the hierarchy, we get exactly the series one level up.

Let us see this with an example: Australian tourism. This data contains quarterly domestic tourism demand, measured as the number of overnight trips Australians spend away from home.

library('fpp3')
## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
## ✔ tibble      3.1.8      ✔ tsibble     1.1.3 
## ✔ dplyr       1.0.10     ✔ tsibbledata 0.4.1 
## ✔ tidyr       1.2.1      ✔ feasts      0.3.0 
## ✔ lubridate   1.9.0      ✔ fable       0.3.2 
## ✔ ggplot2     3.4.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
tourism <- tsibble::tourism |>
  mutate(State = recode(State,
    `New South Wales` = "NSW",
    `Northern Territory` = "NT",
    `Queensland` = "QLD",
    `South Australia` = "SA",
    `Tasmania` = "TAS",
    `Victoria` = "VIC",
    `Western Australia` = "WA"
  ))

Now the aggregate_key() function can create a hierarchical time series from the bottom level, i.e., from the states up to the country.

Let’s try it out

tourism_hts <- tourism |>
  aggregate_key(State / Region, Trips = sum(Trips))
tourism_hts
## # A tsibble: 6,800 x 4 [1Q]
## # Key:       State, Region [85]
##    Quarter State        Region        Trips
##      <qtr> <chr*>       <chr*>        <dbl>
##  1 1998 Q1 <aggregated> <aggregated> 23182.
##  2 1998 Q2 <aggregated> <aggregated> 20323.
##  3 1998 Q3 <aggregated> <aggregated> 19827.
##  4 1998 Q4 <aggregated> <aggregated> 20830.
##  5 1999 Q1 <aggregated> <aggregated> 22087.
##  6 1999 Q2 <aggregated> <aggregated> 21458.
##  7 1999 Q3 <aggregated> <aggregated> 19914.
##  8 1999 Q4 <aggregated> <aggregated> 20028.
##  9 2000 Q1 <aggregated> <aggregated> 22339.
## 10 2000 Q2 <aggregated> <aggregated> 19941.
## # … with 6,790 more rows
tourism_hts |>
  filter(is_aggregated(Region)) |>
  autoplot(Trips) +
  labs(y = "Trips ('000)",
       title = "Australian tourism: national and states") +
  facet_wrap(vars(State), scales = "free_y", ncol = 3) +
  theme(legend.position = "none")

tourism_hts |>
  filter(State == "NT" | State == "QLD" |
         State == "TAS" | State == "VIC", is_aggregated(Region)) |>
  select(-Region) |>
  mutate(State = factor(State, levels=c("QLD","VIC","NT","TAS"))) |>
  gg_season(Trips) +
  facet_wrap(vars(State), nrow = 2, scales = "free_y")+
  labs(y = "Trips ('000)")

From this graph we can see that tourism in Queensland and the Northern Territory peaks in winter, whereas in the southern states of Victoria and Tasmania tourism is highest in the summer months.

This plot shows what is going on within states: some series show strong trends or seasonality, some show contrasting seasonality, while others appear to be just noise.

Grouped time series

Now let's understand grouped time series. This type of structure arises when there is no single natural hierarchy.

For example: business trips and vacation trips.

$y_t = y_{AX,t} + y_{AY,t} + y_{BX,t} + y_{BY,t}$

$y_{A,t} = y_{AX,t} + y_{AY,t}$

$y_{B,t} = y_{BX,t} + y_{BY,t}$

$y_{X,t} = y_{AX,t} + y_{BX,t}$

$y_{Y,t} = y_{AY,t} + y_{BY,t}$

If we total up the bottom-level series, we get the same result as aggregating one level up, whichever grouping we use.

Let us see this with an example. Let's take the Australian prison population data, which we disaggregate by Gender, State, and Legal Status.

prison <- readr::read_csv("https://raw.githubusercontent.com/chirag35847/TS-images/main/prison_population.csv") |>
  mutate(Quarter = yearquarter(Date)) |>
  select(-Date)  |>
  as_tsibble(key = c(Gender, Legal, State, Indigenous),
             index = Quarter) |>
  relocate(Quarter)
## Rows: 3072 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): State, Gender, Legal, Indigenous
## dbl  (1): Count
## date (1): Date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The syntax for a grouped time series uses * between the grouping variables:

prison_gts <- prison |>
  aggregate_key(Gender * Legal * State, Count = sum(Count)/1e3)

Prisoner numbers for the country, by gender:

prison_gts |>
  filter(!is_aggregated(Gender), is_aggregated(Legal),
         is_aggregated(State)) |>
  autoplot(Count) +
  labs(y = "Number of prisoners ('000)")

Now the same split by state and gender:

prison_gts |>
  filter(!is_aggregated(Gender), !is_aggregated(Legal),
         !is_aggregated(State)) |>
  mutate(Gender = as.character(Gender)) |>
  ggplot(aes(x = Quarter, y = Count,
             group = Gender, colour=Gender)) +
  stat_summary(fun = sum, geom = "line") +
  labs(title = "Prison population by state and gender",
       y = "Number of prisoners ('000)") +
  facet_wrap(~ as.character(State),
             nrow = 1, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Mixed hierarchical and grouped structure

When the data is both nested and crossed. For example, the purpose of travel (holiday, business, visiting family, etc.) can also be crossed with the geographical hierarchy.

We use aggregate_key() with the following syntax. The resulting tourism_full contains 425 series: 85 from the hierarchy, and 340 formed by crossing each hierarchical series with the purpose of travel.

tourism_full <- tourism |>
  aggregate_key((State/Region) * Purpose, Trips = sum(Trips))

Let's look at the country's purpose of travel.

Now let's look at the state-wise purpose of travel.
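A sketch of both plots, assuming tourism_full has been built with aggregate_key() as above (the facet and label choices are illustrative):

```r
# National trips by purpose of travel
tourism_full |>
  filter(is_aggregated(State), !is_aggregated(Purpose)) |>
  autoplot(Trips) +
  labs(y = "Trips ('000)",
       title = "Australian tourism: by purpose of travel")

# State-level trips by purpose of travel
tourism_full |>
  filter(!is_aggregated(State), is_aggregated(Region),
         !is_aggregated(Purpose)) |>
  autoplot(Trips) +
  facet_wrap(vars(State), scales = "free_y") +
  labs(y = "Trips ('000)",
       title = "Australian tourism: by state and purpose of travel")
```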

Single level approaches

Single-level approaches involve selecting one level of aggregation and generating forecasts for that level.

Coherent forecasts for the rest of the structure are then obtained by aggregating these forecasts upwards (bottom-up) or disaggregating them downwards (top-down).

Bottom-up Approach

A simple method for generating coherent forecasts

This approach involves first generating forecasts for each series at the bottom level, and then summing these to produce forecasts for all the series in the structure.

$\hat{y}_{AA,h}$, $\hat{y}_{AB,h}$, $\hat{y}_{AC,h}$, $\hat{y}_{BA,h}$ and $\hat{y}_{BB,h}$

Summing these, we get $h$-step-ahead coherent forecasts for the rest of the series:

$\tilde{y}_h = \hat{y}_{AA,h} + \hat{y}_{AB,h} + \hat{y}_{AC,h} + \hat{y}_{BA,h} + \hat{y}_{BB,h}$,
$\tilde{y}_{A,h} = \hat{y}_{AA,h} + \hat{y}_{AB,h} + \hat{y}_{AC,h}$, and
$\tilde{y}_{B,h} = \hat{y}_{BA,h} + \hat{y}_{BB,h}$.

For this approach:

  • No information is lost due to aggregation.

  • However, bottom-level data can be quite noisy and more challenging to model and forecast.

Example: Generating bottom-up forecasts

Aim: national and state forecasts for the Australian tourism data, without disaggregating by region or purpose of travel.

tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key:       Region, State, Purpose [304]
##    Quarter Region   State Purpose  Trips
##      <qtr> <chr>    <chr> <chr>    <dbl>
##  1 1998 Q1 Adelaide SA    Business  135.
##  2 1998 Q2 Adelaide SA    Business  110.
##  3 1998 Q3 Adelaide SA    Business  166.
##  4 1998 Q4 Adelaide SA    Business  127.
##  5 1999 Q1 Adelaide SA    Business  137.
##  6 1999 Q2 Adelaide SA    Business  200.
##  7 1999 Q3 Adelaide SA    Business  169.
##  8 1999 Q4 Adelaide SA    Business  134.
##  9 2000 Q1 Adelaide SA    Business  154.
## 10 2000 Q2 Adelaide SA    Business  169.
## # … with 24,310 more rows
tourism_states <- tourism |>
  aggregate_key(State, Trips = sum(Trips))
tourism_states
## # A tsibble: 720 x 3 [1Q]
## # Key:       State [9]
##    Quarter State         Trips
##      <qtr> <chr*>        <dbl>
##  1 1998 Q1 <aggregated> 23182.
##  2 1998 Q2 <aggregated> 20323.
##  3 1998 Q3 <aggregated> 19827.
##  4 1998 Q4 <aggregated> 20830.
##  5 1999 Q1 <aggregated> 22087.
##  6 1999 Q2 <aggregated> 21458.
##  7 1999 Q3 <aggregated> 19914.
##  8 1999 Q4 <aggregated> 20028.
##  9 2000 Q1 <aggregated> 22339.
## 10 2000 Q2 <aggregated> 19941.
## # … with 710 more rows
fcasts_state <- tourism_states |>
  filter(!is_aggregated(State)) |>
  model(ets = ETS(Trips)) |>
  forecast()
# Sum bottom-level forecasts to get top-level forecasts
fcasts_national <- fcasts_state |>
  summarise(value = sum(Trips), .mean = mean(value))
tourism_states |>
  model(ets = ETS(Trips)) |>
  reconcile(bu = bottom_up(ets)) |>
  forecast()
## # A fable: 144 x 5 [1Q]
## # Key:     State, .model [18]
##    State  .model Quarter         Trips .mean
##    <chr*> <chr>    <qtr>        <dist> <dbl>
##  1 ACT    ets    2018 Q1  N(701, 7651)  701.
##  2 ACT    ets    2018 Q2  N(717, 8032)  717.
##  3 ACT    ets    2018 Q3  N(734, 8440)  734.
##  4 ACT    ets    2018 Q4  N(750, 8882)  750.
##  5 ACT    ets    2019 Q1  N(767, 9368)  767.
##  6 ACT    ets    2019 Q2  N(784, 9905)  784.
##  7 ACT    ets    2019 Q3 N(800, 10503)  800.
##  8 ACT    ets    2019 Q4 N(817, 11171)  817.
##  9 ACT    bu     2018 Q1  N(701, 7651)  701.
## 10 ACT    bu     2018 Q2  N(717, 8032)  717.
## # … with 134 more rows

Workflow for forecasting aggregation structures

data |> 
  aggregate_key() |> 
  model() |>
  reconcile() |> 
  forecast()

  1. Begin with a tsibble object (here labelled data) containing the individual bottom-level series.

  2. Define in aggregate_key() the aggregation structure and build a tsibble object that also contains the aggregate series.

  3. Identify a model() for each series, at all levels of aggregation.

  4. Specify in reconcile() how the coherent forecasts are to be generated from the selected models.

  5. Use the forecast() function to generate forecasts for the whole aggregation structure.

Top-down Approach

Top-down approaches involve first generating forecasts for the Total series $y_t$, and then disaggregating these down the hierarchy.

Let $p_1, \dots, p_m$ denote a set of disaggregation proportions which determine how the forecasts of the Total series are distributed to obtain forecasts for each series at the bottom level of the structure. For example, for the hierarchy in the figure above, using proportions $p_1, \dots, p_5$ we get:

$\tilde{y}_{AA,t} = p_1\hat{y}_t$,  $\tilde{y}_{AB,t} = p_2\hat{y}_t$,  $\tilde{y}_{AC,t} = p_3\hat{y}_t$,  $\tilde{y}_{BA,t} = p_4\hat{y}_t$  and  $\tilde{y}_{BB,t} = p_5\hat{y}_t$.

Top-down forecasts can be generated using top_down() within the reconcile() function.
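For example, a sketch reusing the state-level series built in the bottom-up example (top_down() accepts a method argument, discussed below):

```r
tourism_states |>
  model(ets = ETS(Trips)) |>
  reconcile(td = top_down(ets, method = "forecast_proportions")) |>
  forecast()
```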

Top-down Methods

Average historical proportions

$$p_j = \frac{1}{T}\sum_{t=1}^{T}\frac{y_{j,t}}{y_t}$$

for $j = 1, \dots, m$. Each proportion $p_j$ reflects the average of the historical proportions of the bottom-level series $y_{j,t}$ over the period $t = 1, \dots, T$, relative to the total aggregate $y_t$.

This approach is implemented in the top_down() function by setting method = "average_proportions".

Proportions of the historical averages

$$p_j = \left.\sum_{t=1}^{T}\frac{y_{j,t}}{T} \right/ \sum_{t=1}^{T}\frac{y_t}{T}$$

for $j = 1, \dots, m$. Each proportion $p_j$ captures the average historical value of the bottom-level series $y_{j,t}$ relative to the average value of the total aggregate $y_t$.

This approach is implemented in the top_down() function by setting method = "proportion_averages".

Forecast proportions

(Figure: hierarchical tree diagram.)

$$p_j = \prod_{\ell=0}^{K-1}\frac{\hat{y}_{j,h}^{(\ell)}}{\hat{S}_{j,h}^{(\ell+1)}}$$

where $j = 1, 2, \dots, m$; $\hat{y}_{j,h}^{(\ell)}$ is the $h$-step-ahead initial forecast of the series that corresponds to the node which is $\ell$ levels above $j$; and $\hat{S}_{j,h}^{(\ell)}$ is the sum of the $h$-step-ahead initial forecasts below the node that is $\ell$ levels above node $j$ and are directly connected to that node. These forecast proportions disaggregate the $h$-step-ahead initial forecast of the Total series to get $h$-step-ahead coherent forecasts of the bottom-level series.

This approach is implemented in the top_down() function by setting method = "forecast_proportions". It is the default method.

Advantages

  1. Simplicity: one only needs to model and generate forecasts for the most aggregated, top-level series.
  2. In general, it produces quite reliable forecasts for the aggregate levels, and these are useful with low-count data.

Disadvantage

The main disadvantage is the loss of information due to aggregation: we are unable to capture and take advantage of individual series characteristics such as time dynamics, special events, and different seasonal patterns.

Middle-out approach

The middle-out approach combines bottom-up and top-down approaches.

First, a “middle” level is chosen and forecasts are generated for all the series at this level. For the series above the middle level, coherent forecasts are generated using the bottom-up approach by aggregating the “middle-level” forecasts upwards. For the series below the “middle level”, coherent forecasts are generated using a top-down approach by disaggregating the “middle level” forecasts downwards.
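A sketch using the State/Region hierarchy built earlier, taking the state level as the middle level. Here middle_out() comes from fabletools; the assumption that split = 1 selects the level one below the top is worth checking against the package documentation:

```r
tourism_hts |>
  model(ets = ETS(Trips)) |>
  reconcile(mo = middle_out(ets, split = 1)) |>
  forecast()
```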

Forecast Reconciliation

The equations we have seen so far can be represented using matrix notation. For each aggregation structure we construct an $n \times m$ matrix $\mathbf{S}$, known as the "summing matrix", which determines how the bottom-level series are aggregated.

Example 1

We can represent the hierarchical example above in matrix form as follows:

$$\begin{bmatrix} y_t \\ y_{A,t} \\ y_{B,t} \\ y_{AA,t} \\ y_{AB,t} \\ y_{AC,t} \\ y_{BA,t} \\ y_{BB,t} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_{AA,t} \\ y_{AB,t} \\ y_{AC,t} \\ y_{BA,t} \\ y_{BB,t} \end{bmatrix}$$

or, more compactly:

$\mathbf{y}_t = \mathbf{S}\mathbf{b}_t$

where $\mathbf{y}_t$ is an $n$-dimensional vector of all the observations in the hierarchy at time $t$, $\mathbf{S}$ is the summing matrix, and $\mathbf{b}_t$ is an $m$-dimensional vector of all the observations at the bottom level of the hierarchy at time $t$. The first row of the summing matrix $\mathbf{S}$ reflects

$y_t = y_{AA,t} + y_{AB,t} + y_{AC,t} + y_{BA,t} + y_{BB,t},$

whereas the second and third rows represent

$y_{A,t} = y_{AA,t} + y_{AB,t} + y_{AC,t} \quad \text{and} \quad y_{B,t} = y_{BA,t} + y_{BB,t}.$

The rows below these form an $m$-dimensional identity matrix $\mathbf{I}_m$, so that each bottom-level observation on the right side of the equation is equal to itself on the left side.
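To make the notation concrete, here is a small base-R check, with made-up bottom-level values, that multiplying the summing matrix by the bottom-level vector reproduces every aggregate:

```r
# Summing matrix S for the hierarchy above (n = 8, m = 5):
# row 1: Total; row 2: A; row 3: B; rows 4-8: identity for the bottom level
S <- rbind(
  c(1, 1, 1, 1, 1),
  c(1, 1, 1, 0, 0),
  c(0, 0, 0, 1, 1),
  diag(5)
)
b <- c(10, 20, 30, 40, 50)   # made-up bottom-level observations b_t
y <- as.vector(S %*% b)      # full coherent vector y_t
# y[1] is the total (150), y[2] = y_A (60), y[3] = y_B (90),
# and y[4:8] reproduce the bottom level unchanged
```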

Similarly, for the grouped structure of Example 2, the matrix representation is as follows:

$$\begin{bmatrix} y_t \\ y_{A,t} \\ y_{B,t} \\ y_{X,t} \\ y_{Y,t} \\ y_{AX,t} \\ y_{AY,t} \\ y_{BX,t} \\ y_{BY,t} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_{AX,t} \\ y_{AY,t} \\ y_{BX,t} \\ y_{BY,t} \end{bmatrix},$$

or, equivalently, $\mathbf{y}_t = \mathbf{S}\mathbf{b}_t$, where the second and third rows of $\mathbf{S}$ represent

$y_{A,t} = y_{AX,t} + y_{AY,t} \quad \text{and} \quad y_{B,t} = y_{BX,t} + y_{BY,t},$

and the fourth and fifth rows represent

$y_{X,t} = y_{AX,t} + y_{BX,t} \quad \text{and} \quad y_{Y,t} = y_{AY,t} + y_{BY,t}.$

This matrix notation enables us to express all forecasting methods for hierarchical or grouped time series in a single framework.

Suppose we forecast all series ignoring any aggregation constraints. These are called the base forecasts, denoted $\hat{\mathbf{y}}_h$, where $h$ is the forecast horizon. They are stacked in the same order as the data $\mathbf{y}_t$.

Hence, for either hierarchical or grouped structures, all coherent forecasting approaches can be represented as

$\tilde{\mathbf{y}}_h = \mathbf{S}\mathbf{G}\hat{\mathbf{y}}_h,$

where $\mathbf{G}$ is a matrix that maps the base forecasts into the bottom level, and the summing matrix $\mathbf{S}$ sums these up using the aggregation structure to produce a set of coherent forecasts $\tilde{\mathbf{y}}_h$.

The $\mathbf{G}$ matrix is defined according to the approach used. For example, if the bottom-up approach is used to forecast the hierarchy of Example 1, then

$$\mathbf{G} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

Notice that $\mathbf{G}$ has two partitions: the first three columns zero out the base forecasts of the series above the bottom level, while the $m$-dimensional identity matrix picks out the bottom-level base forecasts. The $\mathbf{S}$ matrix then sums these up.
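A small base-R sketch of this partitioned bottom-up $\mathbf{G}$, using the Example 1 summing matrix and made-up (deliberately incoherent) base forecasts:

```r
S <- rbind(c(1,1,1,1,1), c(1,1,1,0,0), c(0,0,0,1,1), diag(5))
# Bottom-up G: three zero columns discard the upper-level base forecasts,
# the identity block keeps the five bottom-level ones
G_bu <- cbind(matrix(0, nrow = 5, ncol = 3), diag(5))
yhat <- c(140, 55, 85, 10, 20, 30, 40, 50)  # incoherent base forecasts
ytilde <- as.vector(S %*% G_bu %*% yhat)
# The reconciled top level (150) is rebuilt from the bottom-level
# forecasts, ignoring the incoherent base value of 140
```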

If one of the top-down approaches was used instead, then

$$\mathbf{G} = \begin{bmatrix} p_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ p_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ p_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ p_4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ p_5 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The first column contains the proportions that distribute the top-level base forecast to the bottom level; the $\mathbf{S}$ matrix then sums these up. The remaining columns zero out the base forecasts below the top level of aggregation.

The $\mathbf{G}$ matrix for a middle-out approach is a hybrid of the two above. The base forecasts of a pre-selected middle level are disaggregated to the bottom level using a set of proportions, all other base forecasts are zeroed out, and the bottom-level forecasts are then summed up the hierarchy using the summing matrix.


$$\tilde{\mathbf{y}}_h = \mathbf{S}\mathbf{G}\hat{\mathbf{y}}_h \tag{11.7}$$

As discussed above, this equation shows that $\mathbf{S}\mathbf{G}$ returns a set of coherent forecasts.

Traditional methods use base forecasts from a single level of aggregation, and so use only limited information.

Ideally, however, we want to choose an optimal $\mathbf{G}$ matrix, so that $\mathbf{S}\mathbf{G}$ combines and reconciles all the base forecasts to produce coherent forecasts.

The MinT optimal reconciliation approach

This approach finds the $\mathbf{G}$ matrix that minimises the total forecast variance of the set of coherent forecasts, leading to the MinT (Minimum Trace) optimal reconciliation approach.

Before going any further, we need to make sure that the forecasts are unbiased.

The reconciled forecasts $\tilde{\mathbf{y}}_h$ are unbiased provided the base forecasts $\hat{\mathbf{y}}_h$ are unbiased and $\mathbf{S}\mathbf{G}\mathbf{S} = \mathbf{S}$.

IMPORTANT: No top-down method satisfies this constraint, so all top-down approaches result in biased coherent forecasts.
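This condition is easy to check numerically; a base-R sketch with the Example 1 matrices and made-up top-down proportions:

```r
S <- rbind(c(1,1,1,1,1), c(1,1,1,0,0), c(0,0,0,1,1), diag(5))
G_bu <- cbind(matrix(0, 5, 3), diag(5))   # bottom-up mapping matrix
p <- c(0.1, 0.2, 0.3, 0.2, 0.2)           # made-up top-down proportions
G_td <- cbind(p, matrix(0, 5, 7))         # top-down mapping matrix
all(S %*% G_bu %*% S == S)  # TRUE:  bottom-up satisfies SGS = S
all(S %*% G_td %*% S == S)  # FALSE: top-down does not
```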

Finding errors: the variance-covariance matrix of the $h$-step-ahead coherent forecast errors is given by

$$\mathbf{V}_h = \operatorname{Var}[\mathbf{y}_{T+h} - \tilde{\mathbf{y}}_h] = \mathbf{S}\mathbf{G}\mathbf{W}_h\mathbf{G}'\mathbf{S}'$$

where $\mathbf{W}_h = \operatorname{Var}[\mathbf{y}_{T+h} - \hat{\mathbf{y}}_h]$ is the variance-covariance matrix of the corresponding base forecast errors.
The objective is to find a matrix $\mathbf{G}$ that minimises the error variances of the coherent forecasts. These error variances lie on the diagonal of $\mathbf{V}_h$, so the sum of all the error variances is the trace of $\mathbf{V}_h$.

The matrix $\mathbf{G}$ that minimises the trace of $\mathbf{V}_h$, subject to $\mathbf{S}\mathbf{G}\mathbf{S} = \mathbf{S}$, is given by

$$\mathbf{G} = (\mathbf{S}'\mathbf{W}_h^{-1}\mathbf{S})^{-1}\mathbf{S}'\mathbf{W}_h^{-1}$$

Therefore, the optimally reconciled forecasts are given by

$$\tilde{\mathbf{y}}_h = \mathbf{S}(\mathbf{S}'\mathbf{W}_h^{-1}\mathbf{S})^{-1}\mathbf{S}'\mathbf{W}_h^{-1}\hat{\mathbf{y}}_h \tag{11.8}$$

MinT is implemented by min_trace() within the reconcile() function.
We need to estimate $\mathbf{W}_h$, the forecast error variance of the $h$-step-ahead base forecasts. This can be difficult, so we use four simplifying approximations that have been shown to work well in both simulations and in practice.

  1. Set $\mathbf{W}_h = k_h\mathbf{I}$ for all $h$, where $k_h > 0$. This is the most restrictive simplifying assumption, and it means that $\mathbf{G}$ is independent of the data, providing substantial computational savings. The disadvantage, however, is that this specification does not account for the differences in scale between the levels of the structure, or for relationships between series.

    Setting $\mathbf{W}_h = k_h\mathbf{I}$ in Equation (11.8) gives the ordinary least squares (OLS) estimator with $\mathbf{X} = \mathbf{S}$ and $\mathbf{y} = \hat{\mathbf{y}}_h$. Hence this approach is usually referred to as OLS reconciliation. It is implemented in min_trace() by setting method = "ols".

  2. Set $\mathbf{W}_h = k_h\operatorname{diag}(\hat{\mathbf{W}}_1)$ for all $h$, where $k_h > 0$,

    $$\hat{\mathbf{W}}_1 = \frac{1}{T}\sum_{t=1}^{T}\mathbf{e}_t\mathbf{e}_t'$$

    and $\mathbf{e}_t$ is an $n$-dimensional vector of residuals of the models that generated the base forecasts, stacked in the same order as the data. Because this specification scales the base forecasts using the variance of the residuals, it is referred to as the WLS (weighted least squares) estimator using variance scaling. The approach is implemented in min_trace() by setting method = "wls_var".

  3. Set $\mathbf{W}_h = k_h\mathbf{\Lambda}$ for all $h$, where $k_h > 0$, $\mathbf{\Lambda} = \operatorname{diag}(\mathbf{S}\mathbf{1})$, and $\mathbf{1}$ is a unit vector of dimension $m$ (the number of bottom-level series). This is referred to as structural scaling because the estimator depends only on the structure of the aggregations, and not on the actual data. The approach is implemented in min_trace() by setting method = "wls_struct".

  4. Set $\mathbf{W}_h = k_h\mathbf{W}_1$ for all $h$, where $k_h > 0$. Here we only assume that the error covariance matrices are proportional to each other, and we directly estimate the full one-step covariance matrix $\mathbf{W}_1$. The most obvious and simple way is to use the sample covariance. This is implemented in min_trace() by setting method = "mint_cov".

    However, when the number of bottom-level series $m$ is large compared to the length of the series $T$, this is not a good estimator. Instead we use a shrinkage estimator which shrinks the sample covariance towards a diagonal matrix. This is implemented in min_trace() by setting method = "mint_shrink".
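In the OLS case, the MinT formula collapses to $\mathbf{G} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'$, since the scalar $k_h$ cancels. This can be checked in base R on the Example 1 structure, with made-up incoherent base forecasts:

```r
S <- rbind(c(1,1,1,1,1), c(1,1,1,0,0), c(0,0,0,1,1), diag(5))
G_ols <- solve(t(S) %*% S) %*% t(S)         # W_h = k_h * I cancels out
yhat <- c(140, 55, 85, 10, 20, 30, 40, 50)  # incoherent base forecasts
ytilde <- as.vector(S %*% G_ols %*% yhat)
# The reconciled forecasts are coherent: each aggregate equals
# the sum of its reconciled children
all.equal(ytilde[1], sum(ytilde[4:8]))  # TRUE
all.equal(ytilde[2], sum(ytilde[4:6]))  # TRUE
```

Because $\mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'$ is a projection onto the column space of $\mathbf{S}$, any vector it produces automatically satisfies the aggregation constraints.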

Unlike any other method, MinT optimal reconciliation uses all the information available within a hierarchical or grouped structure. This is important, because particular aggregation levels or groupings may reveal features of the data that are of interest to the user and crucial for modelling.

11.4 Forecasting Australian domestic tourism

We will compute forecasts for the Australian tourism data. We use the data up to the end of 2015 as a training set, withholding the final two years (eight quarters, 2016Q1–2017Q4) as a test set for evaluation. The code below demonstrates the full workflow for generating coherent forecasts using the bottom-up, OLS and MinT methods.

The accuracy of the forecasts over the test set can be evaluated using the accuracy() function. We summarise some results using RMSE and MASE.

tourism_full <- tourism |>
  aggregate_key((State/Region) * Purpose, Trips = sum(Trips))
fit <- tourism_full |>
  filter(year(Quarter) <= 2015) |>
  model(base = ETS(Trips)) |>
  reconcile(
    bu = bottom_up(base),
    ols = min_trace(base, method = "ols"),
    mint = min_trace(base, method = "mint_shrink")
  )
fc <- fit |> forecast(h = "2 years")
fc |>
  filter(is_aggregated(Region), is_aggregated(Purpose)) |>
  autoplot(
    tourism_full |> filter(year(Quarter) >= 2011),
    level = NULL
  ) +
  labs(y = "Trips ('000)") +
  facet_wrap(vars(State), scales = "free_y")

fc |>
  filter(is_aggregated(State), !is_aggregated(Purpose)) |>
  autoplot(
    tourism_full |> filter(year(Quarter) >= 2011),
    level = NULL
  ) +
  labs(y = "Trips ('000)") +
  facet_wrap(vars(Purpose), scales = "free_y")

fc |>
  filter(is_aggregated(State), is_aggregated(Purpose)) |>
  accuracy(
    data = tourism_full,
    measures = list(rmse = RMSE, mase = MASE)
  ) |>
  group_by(.model) |>
  summarise(rmse = mean(rmse), mase = mean(mase))
## # A tibble: 4 × 3
##   .model  rmse  mase
##   <chr>  <dbl> <dbl>
## 1 base   1721.  1.53
## 2 bu     3070.  3.16
## 3 mint   2158.  2.09
## 4 ols    1804.  1.63

11.5 Reconciled distributional forecasts

Reconciled distributional forecasts are a type of probabilistic forecasting that involve reconciling or combining multiple individual forecasts, each with their own distributional assumptions, into a single coherent forecast with a distributional assumption that reflects the information from all the individual forecasts.

We will focus on two fundamental results that are implemented in the reconcile() function.

  1. If the base forecasts are normally distributed,

    $$\hat{\mathbf{y}}_h \sim N(\hat{\boldsymbol{\mu}}_h, \hat{\boldsymbol{\Sigma}}_h),$$

    then the reconciled forecasts are also normally distributed,

    $$\tilde{\mathbf{y}}_h \sim N(\mathbf{S}\mathbf{G}\hat{\boldsymbol{\mu}}_h,\ \mathbf{S}\mathbf{G}\hat{\boldsymbol{\Sigma}}_h\mathbf{G}'\mathbf{S}'),$$

    where $\mathbf{S}$ is the summing matrix and $\mathbf{G}$ is the reconciliation matrix.

  2. When the assumption of normality is not reasonable for the base forecasts:

    we can use a non-parametric approach such as bootstrapping to generate a large number of possible future sample paths from the model(s) that produce the base forecasts. These sample paths represent a range of possible future outcomes based on the underlying model(s) and provide a way to estimate the distribution of the forecast errors.

    Once the sample paths are generated, we can then reconcile them using a reconciliation method such as hierarchical forecasting or Bayesian model averaging. This involves adjusting the individual sample paths to ensure that they are consistent with each other and with any known constraints or relationships between the variables being forecasted.

    After the sample paths are reconciled, we can compute coherent prediction intervals that reflect the uncertainty in the reconciled forecast. One way to do this is to compute quantiles of the reconciled sample paths at each future point in time.

    For example, if we generate 1,000 reconciled sample paths, we can compute the 2.5th and 97.5th percentiles of the sample paths at each future point to obtain a 95% prediction interval.

    This approach allows us to account for the non-normality of the forecast errors and to incorporate the information from the base forecasts into a single, coherent forecast that reflects the collective knowledge and information of the individual forecasts.
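The quantile step can be sketched in base R; here simulated Gaussian paths stand in for the bootstrapped sample paths that the base models would generate:

```r
set.seed(1)
# 1000 reconciled sample paths (rows) over a horizon of 8 quarters (columns);
# in practice these would come from bootstrapping the fitted models
paths <- matrix(rnorm(1000 * 8, mean = 100, sd = 10), nrow = 1000)
# 95% prediction interval at each horizon from the path quantiles
lower <- apply(paths, 2, quantile, probs = 0.025)
upper <- apply(paths, 2, quantile, probs = 0.975)
```

In fable, bootstrapped sample paths can be requested through the forecast() function's bootstrap-related arguments; check the package documentation for the exact interface.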

11.6 Forecasting Australian prison population

Returning to the Australian prison population data (Section 11.1), we will compare the forecasts from the bottom-up and MinT methods applied to base ETS models.

prison <- readr::read_csv("https://raw.githubusercontent.com/chirag35847/TS-images/main/prison_population.csv") |>
  mutate(Quarter = yearquarter(Date)) |>
  select(-Date)  |>
  as_tsibble(key = c(Gender, Legal, State, Indigenous),
             index = Quarter) |>
  relocate(Quarter)
## Rows: 3072 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): State, Gender, Legal, Indigenous
## dbl  (1): Count
## date (1): Date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
prison_gts <- prison |>
  aggregate_key(Gender * Legal * State, Count = sum(Count)/1e3)
fit <- prison_gts |>
  filter(year(Quarter) <= 2014) |>
  model(base = ETS(Count)) |>
  reconcile(
    bottom_up = bottom_up(base),
    MinT = min_trace(base, method = "mint_shrink")
  )
fc <- fit |> forecast(h = 8)
fc |>
  filter(is_aggregated(State), is_aggregated(Gender),
         is_aggregated(Legal)) |>
  autoplot(prison_gts, alpha = 0.7, level = 90) +
  labs(y = "Number of prisoners ('000)",
       title = "Australian prison population (total)")

fc |>
  filter(
    .model %in% c("base", "MinT"),
    !is_aggregated(State), is_aggregated(Legal),
    is_aggregated(Gender)
  ) |>
  autoplot(
    prison_gts |> filter(year(Quarter) >= 2010),
    alpha = 0.7, level = 90
  ) +
  labs(title = "Prison population (by state)",
       y = "Number of prisoners ('000)") +
  facet_wrap(vars(State), scales = "free_y", ncol = 4) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

fc |>
  filter(is_aggregated(State), is_aggregated(Gender),
         is_aggregated(Legal)) |>
  accuracy(data = prison_gts,
           measures = list(mase = MASE,
                           ss = skill_score(CRPS)
           )
  ) |>
  group_by(.model) |>
  summarise(mase = mean(mase), sspc = mean(ss) * 100)
## # A tibble: 3 × 3
##   .model     mase  sspc
##   <chr>     <dbl> <dbl>
## 1 base      1.72   55.9
## 2 bottom_up 1.84   33.5
## 3 MinT      0.895  76.8